{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### pry-docの読み込み\n",
    "#require \"/root/git_jupyter_notebook/Ruby/vendor/bundle/ruby/2.3.0/gems/pry-doc-0.10.0/lib/pry-doc\"\n",
    "require \"/Users/ftakao2007/jupyter/vendor/bundle/ruby/2.3.0/gems/pry-doc-0.10.0/lib/pry-doc\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Regexpクラス\n",
    "\n",
    "* 正規表現オブジェクトを扱うクラス\n",
    "* 正規表現を使って文字列やデータマッチングを行う"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## 正規表現オブジェクトを生成する\n",
    "\n",
    "* 正規表現リテラルを使って表現できる\n",
    "    * 末尾にオプションを指定できる\n",
    "        * i : 大文字と小文字を無視する\n",
    "        * m : 正規表現の「.」で改行にマッチさせる\n",
    "        * x : 空白や#から始まるコメントを無視する\n",
    "        * オプションは複数指定できる\n",
    "* Regexp.new, Regexp.compileメソッドで生成できる\n",
    "    * 二つ目の引数に、マッチングのオプションを指定できる\n",
    "        * Regexp::IGNORECASE : 大文字と小文字を無視する\n",
    "        * Regexp::MULTILINE  : 正規表現の「.」で改行にマッチさせる\n",
    "        * Regexp::EXTENDED : バックスラッシュでエスケープされていない空白や#から始まるコメントを無視する"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Regexp\n"
     ]
    }
   ],
   "source": [
    "a = /abc/\n",
    "puts a.class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "大文字小文字を区別しない\n",
      "false\n",
      "true\n",
      "\n",
      "改行をマッチさせる\n",
      "false\n",
      "true\n",
      "\n",
      "空白やコメントを無視する\n",
      "false\n",
      "true\n",
      "\n",
      "複数指定\n",
      "true\n"
     ]
    }
   ],
   "source": [
    "puts \"大文字小文字を区別しない\"\n",
    "a = /abcde/\n",
    "puts a === \"AbCDe\"\n",
    "b = /abcde/i\n",
    "puts b === \"AbCDe\"\n",
    "puts \n",
    "\n",
    "puts \"改行をマッチさせる\"\n",
    "c = /ab.*cd/\n",
    "puts c === \"ab\\ncd\"\n",
    "d = /ab.*cd/m\n",
    "puts d === \"ab\\ncd\"\n",
    "puts\n",
    "\n",
    "puts \"空白やコメントを無視する\"\n",
    "e = /ab\n",
    "     cde\n",
    "     # コメント\n",
    "     fg/\n",
    "puts e === \"abcdefg\"\n",
    "\n",
    "f = /ab\n",
    "     cde\n",
    "     # コメント\n",
    "     fg/x\n",
    "puts f === \"abcdefg\"\n",
    "puts\n",
    "\n",
    "puts \"複数指定\"\n",
    "g = /ab.*cd/im\n",
    "puts g === \"Ab\\ncD\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "false\n",
      "true\n",
      "true\n",
      "true\n"
     ]
    }
   ],
   "source": [
    "### 「*」は直前にある文字の0回以上の繰り返し\n",
    "### 空白の直後に「10.1.235.71」が続かないとfalse\n",
    "puts /ipv4.dns:\\s*10.1.235.71/ === \"ipv4.dns:                               10.1.236.71,10.1.235.71\"\n",
    "\n",
    "### 「.*」は直前にある文字(任意の文字)の0回以上の繰り返し\n",
    "### 任意の文字の後に「10.1.235.71」があればtrue\n",
    "puts /ipv4.dns:\\s.*10.1.235.71/ === \"ipv4.dns:                               10.1.236.71,10.1.235.71\"\n",
    "\n",
    "puts /export http_proxy=\"http:\\/\\/\\$PROXY\"/ === 'export http_proxy=\"http://$PROXY\"'\n",
    "\n",
    "puts /User/ === 'Run as another user (clamd must be started by root for this option to work)'\n",
    "\n",
    "### 「HA(0個以上の数値).xls」\n",
    "puts /HA[0-9]*\\.xls/ == \"HA.xls\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "大文字小文字を区別しない\n",
      "false\n",
      "true\n",
      "\n",
      "改行をマッチさせる\n",
      "false\n",
      "true\n",
      "\n",
      "空白やコメントを無視する\n",
      "false\n",
      "true\n",
      "\n",
      "複数指定\n",
      "true\n"
     ]
    }
   ],
   "source": [
    "puts \"大文字小文字を区別しない\"\n",
    "a = Regexp.new(\"abcde\")\n",
    "puts a === \"AbCDe\"\n",
    "b = Regexp.new(\"abcde\",Regexp::IGNORECASE)\n",
    "puts b === \"AbCDe\"\n",
    "puts \n",
    "\n",
    "puts \"改行をマッチさせる\"\n",
    "c = Regexp.new(\"ab.*cd\")\n",
    "puts c === \"ab\\ncd\"\n",
    "d = Regexp.new(\"ab.*cd\",Regexp::MULTILINE)\n",
    "puts d === \"ab\\ncd\"\n",
    "puts\n",
    "\n",
    "puts \"空白やコメントを無視する\"\n",
    "e = Regexp.new(\"ab\n",
    "                cde\n",
    "                # コメント\n",
    "                fg\")\n",
    "puts e === \"abcdefg\"\n",
    "\n",
    "f = Regexp.new(\"ab\n",
    "                cde\n",
    "                # コメント\n",
    "                fg\",Regexp::EXTENDED)\n",
    "puts f === \"abcdefg\"\n",
    "puts\n",
    "\n",
    "puts \"複数指定\"\n",
    "g = /ab.*cd/im\n",
    "g = Regexp.new(\"ab.*cd\",Regexp::IGNORECASE | Regexp::MULTILINE)\n",
    "puts g === \"Ab\\ncD\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## 正規表現オブジェクトでマッチングする\n",
    "\n",
    "* メソッド\n",
    "    * match\n",
    "        * 正規表現オブジェクトでマッチングする\n",
    "        * マッチした場合はMatchDataオブジェクトを返す\n",
    "        * マッチしなかった場合はnilを返す\n",
    "    * =~\n",
    "        * マッチした場合はマッチした位置のindexが返る\n",
    "        * マッチしなければnilが返る\n",
    "        * Stringクラスのオブジェクトにもあるメソッドで同様の動作をする\n",
    "    * ===\n",
    "        * マッチするとtrueが返る\n",
    "        * マッチしなければfalseが返る\n",
    "    * ~\n",
    "        * 特殊変数「$_」とマッチングするメソッド\n",
    "        * マッチすると0を返す\n",
    "        * マッチしなければnilを返す"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#<MatchData \"abc\">\n",
      "2\n",
      "true\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = Regexp.new(\"abc\")\n",
    "p a.match(\"abcde\")\n",
    "\n",
    "b = Regexp.new(\"cde\")\n",
    "puts \"abcde\" =~ b\n",
    "\n",
    "c = Regexp.new(\"abc\")\n",
    "puts c === \"abcde\"\n",
    "\n",
    "$_ = \"xyz\"\n",
    "d = Regexp.new(\"x\")\n",
    "~ d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## 正規表現の特殊文字をエスケープする\n",
    "\n",
    "* メソッド\n",
    "    * Regexp.escape, Regexp.quota\n",
    "        * ピリオド(.)やカッコ([]など)をエスケープする"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"a\\\\.\\\\*b\\\\[cd\\\\]\""
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Regexp.escape(\"a.*b[cd]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## マッチした結果を取得する\n",
    "\n",
    "* メソッド\n",
    "    * Regexp.last_match\n",
    "        * 整数値を与えると、該当のマッチ文字列が得られる\n",
    "            * 0\n",
    "                * 正規表現にマッチした文字列\n",
    "                * 特殊変数「`$&`」と同じ\n",
    "            * それ以降の整数\n",
    "                * カッコにマッチした部分文字列\n",
    "                * 特殊変数「`$1`, `$2`, ...」と同じ\n",
    "                "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "34abcdefghijklmnopqrst\n",
      "34abcdefgh\n",
      "jklmnopqrs\n",
      "nil\n",
      "\n",
      "特殊変数\n",
      "12\n",
      "34abcdefghijklmnopqrst\n",
      "uvwxyz\n",
      "34abcdefgh\n",
      "jklmnopqrs\n",
      "nil\n"
     ]
    }
   ],
   "source": [
    "/(3.*h)i(j.*s)t/ =~ \"1234abcdefghijklmnopqrstuvwxyz\"\n",
    "puts Regexp.last_match(0)\n",
    "puts Regexp.last_match(1)\n",
    "puts Regexp.last_match(2)\n",
    "p Regexp.last_match(3)\n",
    "puts\n",
    "\n",
    "puts \"特殊変数\"\n",
    "puts $`\n",
    "puts $&\n",
    "puts $'\n",
    "puts $1\n",
    "puts $2\n",
    "p $3\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## 正規表現の論理和を求める\n",
    "\n",
    "* メソッド\n",
    "    * options\n",
    "        * 正規表現オブジェクトを生成するときに設定したオプションの論理和を返す\n",
    "            * Regexp::IGNORECASE : 大文字と小文字を無視する\n",
    "            * Regexp::MULTILINE  : 正規表現の「.」で改行にマッチさせる\n",
    "            * Regexp::EXTENDED : バックスラッシュでエスケープされていない空白や#から始まるコメントを無視する\n",
    "    * casefold?\n",
    "        * Regexp::IGNORECASEが設定されているかをどうかを返す\n",
    "    * kcode\n",
    "        * 正規表現オブジェクトがコンパイルされている文字コードを`$KCODE`と同じ形式で返す\n",
    "        * 特定の文字コードに対してコンパイルされていない場合はnilが返る\n",
    "            * マッチ時点で`$KCODE`を用いる場合など\n",
    "        * <font color=\"red\">Ruby1.9で廃止されている？</font>\n",
    "    * source\n",
    "        * 正規表現の元となった文字列を返す\n",
    "            * 同様の文字列表現を返すメソッド\n",
    "                * to_s\n",
    "                    * 他の正規表現に埋め込んでも元の意味が保たれる形式\n",
    "                * inspect\n",
    "                    * to_sよりも自然な形式の文字列になるが元の意味は保たれない"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n",
      "\n",
      "Regexp::IGNORECASE\n",
      "0\n",
      "Regexp::EXTENDED\n",
      "2\n",
      "Regexp::MULTILINE\n",
      "0\n",
      "false\n"
     ]
    }
   ],
   "source": [
    "#a = Regexp.new(\"ab.*cd\",Regexp::IGNORECASE | Regexp::MULTILINE)\n",
    "a = Regexp.new(\"ab.*cd\",Regexp::EXTENDED)\n",
    "puts a.options\n",
    "puts\n",
    "puts \"Regexp::IGNORECASE\"\n",
    "puts a.options & Regexp::IGNORECASE\n",
    "puts \"Regexp::EXTENDED\"\n",
    "puts a.options & Regexp::EXTENDED\n",
    "puts \"Regexp::MULTILINE\"\n",
    "puts a.options & Regexp::MULTILINE\n",
    "puts a.casefold?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "abcdefg\n",
      "(?i-mx:abcdefg)\n",
      "/abcdefg/i\n"
     ]
    }
   ],
   "source": [
    "a = Regexp.new(\"abcdefg\",'u')\n",
    "#a.kcode\n",
    "puts a.source\n",
    "puts a.to_s\n",
    "puts a.inspect"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 覚書\n",
    "\n",
    "htmlから特定のアドレスを取得する\n",
    "\n",
    "```\n",
    "0> messages[4].payload.body.data.match(%r{(\"http)://pmrd(.*?\")})[0]\n",
    "=> \"\\\"http://pmrd.rakuten.co.jp/?r=KhT7ALEOFWA%3D&p=EDUMMY1&u=9FFDUMMY\\\"\"\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Ruby 2.3.3",
   "language": "ruby",
   "name": "ruby"
  },
  "language_info": {
   "file_extension": ".rb",
   "mimetype": "application/x-ruby",
   "name": "ruby",
   "version": "2.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
